Using paraphrases for improving first story detection in news and Twitter
Abstract
First story detection (FSD) involves identifying the first stories about events in a continuous stream of documents. A major problem in this task is the high degree of lexical variation in documents, which makes it very difficult to detect stories that describe the same event in different words. We suggest using paraphrases to alleviate this problem, making this the first work to use paraphrases for FSD. We show a novel way of integrating paraphrases with locality-sensitive hashing (LSH) to obtain an efficient FSD system that can scale to very large datasets. Our system achieves state-of-the-art results on the first story detection task, beating both the best supervised and unsupervised systems. To test our approach on large data, we construct a corpus of events for Twitter, consisting of 50 million documents, and show that paraphrasing is also beneficial in this domain.
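The abstract does not spell out how paraphrases and LSH are combined, so the following is only a minimal sketch of one plausible arrangement, not the authors' method. It assumes a toy paraphrase table (`PARAPHRASES`), a down-weighting factor for paraphrase terms (`weight`), random-hyperplane signatures as the LSH scheme, and a novelty threshold of 0.5; all of these names and values are hypothetical.

```python
import math
import random
from collections import Counter, defaultdict

# Hypothetical toy paraphrase table; a real system would load a large resource.
PARAPHRASES = {
    "quake": ["earthquake"], "earthquake": ["quake"],
    "hits": ["strikes"], "strikes": ["hits"],
}

def expand(tokens, paraphrases, weight=0.5):
    """Build a term vector, adding each term's paraphrases at a reduced weight."""
    vec = Counter()
    for term in tokens:
        vec[term] += 1.0
        for para in paraphrases.get(term, []):
            vec[para] += weight
    return vec

def cosine(a, b):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(v * b.get(t, 0.0) for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def signature(vec, bits, seed):
    """Random-hyperplane LSH: one bit per hyperplane, set by the projection's sign.
    Hyperplane components are derived on demand from a string seed, so the
    vocabulary does not need to be known in advance."""
    sig = 0
    for b in range(bits):
        proj = sum(v * random.Random(f"{seed}:{b}:{t}").gauss(0.0, 1.0)
                   for t, v in vec.items())
        sig = (sig << 1) | int(proj >= 0)
    return sig

class FirstStoryDetector:
    def __init__(self, paraphrases, threshold=0.5, bits=8, tables=4):
        self.paraphrases = paraphrases
        self.threshold = threshold
        self.bits = bits
        self.tables = [defaultdict(list) for _ in range(tables)]

    def process(self, tokens):
        """Return True if the document looks like a first story."""
        vec = expand(tokens, self.paraphrases)
        keys = [signature(vec, self.bits, seed=i) for i in range(len(self.tables))]
        best = 0.0
        # Compare only against earlier documents that share a bucket.
        for table, key in zip(self.tables, keys):
            for other in table[key]:
                best = max(best, cosine(vec, other))
        for table, key in zip(self.tables, keys):
            table[key].append(vec)
        return (1.0 - best) > self.threshold

a = "quake hits city".split()
b = "earthquake strikes city".split()
plain_sim = cosine(expand(a, {}), expand(b, {}))
para_sim = cosine(expand(a, PARAPHRASES), expand(b, PARAPHRASES))

detector = FirstStoryDetector(PARAPHRASES)
first_is_new = detector.process(a)   # nothing seen yet, so it is a first story
repeat_is_new = detector.process(a)  # identical document lands in the same buckets
```

In this sketch the paraphrase expansion happens before hashing, so two reports of the same event worded differently receive more similar vectors (`para_sim` exceeds `plain_sim`) and are more likely to collide in a bucket. Because LSH is approximate, a paraphrased near-duplicate is not guaranteed to share a bucket with its predecessor.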
Similar resources
Streaming First Story Detection with application to Twitter
With the recent rise in popularity and size of social media, there is a growing need for systems that can extract useful information from this amount of data. We address the problem of detecting new events from a stream of Twitter posts. To make event detection feasible on web-scale corpora, we present an algorithm based on locality-sensitive hashing which is able to overcome the limitations of tr...
Acquiring Predicate Paraphrases from News Tweets
We present a simple method for ever-growing extraction of predicate paraphrases from news headlines in Twitter. Analysis of the output of ten weeks of collection shows that the accuracy of paraphrases with different support levels is estimated at between 60% and 86%. We also demonstrate that our resource is to a large extent complementary to existing resources, providing many novel paraphrases. Our reso...
Bieber no more: First Story Detection using Twitter and Wikipedia
Twitter is a well-known source of information regarding breaking news stories. This aspect of Twitter makes it ideal for identifying events as they happen. However, a key problem with Twitter-driven event detection approaches is that they produce many spurious events, i.e., events that are wrongly detected or simply are of no interest to anyone. In this paper, we examine whether Wikipedia (when...
Insight4News: Connecting News to Relevant Social Conversations
We present the Insight4News system that connects news articles to social conversations, as echoed in microblogs such as Twitter. Insight4News tracks feeds from mainstream media, e.g., BBC, Irish Times, and extracts relevant topics that summarize the tweet activity around each article, recommends relevant hashtags, and presents complementary views and statistics on the tweet activity, related ne...
Examination of Emergency Medicine Physicians’ and Residents’ Twitter Activities During the First Days of the COVID-19 Outbreak
Introduction: Social media has become an important element of interaction and found itself a place in every aspect of our lives. This study examined the Twitter activities of emergency medicine physicians and residents (EMP&R) about the COVID-19 outbreak. Methods: The study concentrated on Twitter, a major social media network. To identify accounts owned ...